Introduction

In this short workbook, we outline how to use R to visualise and compare the various estimates of the total fertility rate (TFR) of Palau over time. The data source is World Population Prospects (United Nations 2022).

Code

Below is the code used to build the scatter plot comparison.

Loading the Packages

First we load the tidyverse collection of R packages (Wickham et al. 2019), which includes the ggplot2 package (Wickham 2016) and the dplyr package (Wickham et al. 2022). We then load the plotly package (Sievert 2020), which is not part of the tidyverse and is attached separately.
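
The chunk that produced the start-up messages below is not echoed in this rendering. A minimal sketch of what it would look like, assuming both packages are already installed, is:

```r
# Assumed reconstruction of the hidden loading chunk
library(tidyverse)  # attaches ggplot2, dplyr, readr, tibble and others
library(plotly)     # not part of the tidyverse, so it is attached separately
```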

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   1.0.1 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.5.0 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## 
## Attaching package: 'plotly'
## 
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## 
## The following object is masked from 'package:graphics':
## 
##     layout

Loading the data

Next we load our .csv data into R as a tibble object. The tibble has four columns: data_source, estimate_method, estimated_tfr, and estimated_year.

# read the CSV file; read_csv() already returns a tibble
data = read_csv("data/tfr_estimates.csv")
## Rows: 549 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): data_source, estimate_method
## dbl (2): estimated_tfr, estimated_year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
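
Before plotting, it can help to check how many distinct series the data contains. The sketch below uses a small made-up tibble with the same four column names and counts the rows for each combination of data source and estimate method. The labels and values are hypothetical and are not taken from tfr_estimates.csv.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the real data; values are illustrative only
toy <- tibble(
  data_source     = c("Source A", "Source A", "Source B"),
  estimate_method = c("Method 1", "Method 1", "Method 2"),
  estimated_tfr   = c(3.1, 2.9, 2.2),
  estimated_year  = c(1990, 1995, 2000)
)

# One row per series, with the number of estimates in each
series_counts <- toy %>%
  count(data_source, estimate_method, name = "n_estimates")

series_counts
```

Running the same count() call on the real tibble would show how many lines to expect in the plot.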

Plot Scatter

Next, we use ggplot2 to build a static data visualisation, drawing points and connecting lines coloured by each combination of estimate method and data source.

# create the scatter plot
p = ggplot(data, aes(x = estimated_year,
                 y = estimated_tfr,
                 color = interaction(estimate_method,
                                     data_source))) +
  geom_point() +
  geom_line() +
  labs(x = "Estimated Year",
       y = "Estimated TFR",
       title = "Scatterplot of TFR Estimates of Palau 1950-2020",
       color = "") +
  theme_minimal() +
  theme(legend.position = "bottom",
        legend.box = "vertical",
        legend.margin = margin()) +
  guides(color = guide_legend(nrow = 15, byrow = TRUE))

p
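
The colour mapping above relies on interaction() to combine two columns into a single grouping factor, so that each method and source pair gets its own colour and line. A minimal base R sketch of that behaviour, using hypothetical labels rather than the real values in the data:

```r
# Hypothetical labels, for illustration only
method <- c("Method 1", "Method 2", "Method 1")
src    <- c("Source A", "Source A", "Source B")

# interaction() joins the paired values with "." by default
series <- interaction(method, src)

as.character(series)
# "Method 1.Source A" "Method 2.Source A" "Method 1.Source B"
```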

Make interactive

Using the ggplotly() function from the plotly package, we can make the plot interactive. The legend is hidden here with layout(showlegend = FALSE).

# convert the ggplot to plotly
interactive = ggplotly(p) %>% layout(showlegend = FALSE)

interactive
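
To share the interactive plot outside the workbook, one option is to write it to an HTML file. The sketch below assumes the htmlwidgets package is installed; the toy plot and the temporary output path are hypothetical stand-ins and are not part of the original workbook.

```r
library(ggplot2)
library(plotly)

# Toy plot standing in for `p`; values are illustrative only
toy_plot <- ggplot(data.frame(year = c(1990, 2000), tfr = c(3.0, 2.2)),
                   aes(x = year, y = tfr)) +
  geom_point() +
  geom_line()

toy_widget <- ggplotly(toy_plot)

# Hypothetical output path; selfcontained = FALSE avoids needing pandoc,
# at the cost of writing the JavaScript dependencies to a side folder
out_file <- file.path(tempdir(), "tfr_plot.html")
htmlwidgets::saveWidget(toy_widget, out_file, selfcontained = FALSE)
```

The same saveWidget() call could be applied to the `interactive` object created above.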

Bibliography

Sievert, Carson. 2020. “Interactive Web-Based Data Visualization with R, plotly, and shiny.” https://plotly-r.com.
United Nations. 2022. “World Population Prospects - Population Division - United Nations.” https://population.un.org/wpp/.
Wickham, Hadley. 2016. “ggplot2: Elegant Graphics for Data Analysis.” https://ggplot2.tidyverse.org.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2022. “dplyr: A Grammar of Data Manipulation.” https://CRAN.R-project.org/package=dplyr.